Add support for writing timestamps without timezone. #1
Conversation
This also adds AssertJ to testCompile in all modules so assertions can be used elsewhere.
* Spec: Add identifier-field-ids to schema.
* Spec: Add section for partition evolution.
* Spec: Add schemas list and current-schema-id to table metadata.
* Spec: Add key_metadata to manifest list.
* Spec: Add schema-id to Snapshot metadata.
`.withFailMessage(..)` was mistakenly used and was therefore overriding the actual error reporting, making debugging difficult.
…#2689)
* support custom target name in partition spec builder
* address the comments.
…org (apache#2709) Co-authored-by: tgooch <tgooch@netflix.com>
Param scanAllFiles is used to check whether all the data files should be processed, or only added files. Here we should replace scanAllFiles with !scanAllFiles.
Co-authored-by: Ryan Blue <blue@apache.org>
Co-authored-by: tgooch <tgooch@netflix.com>
```java
public static final String HANDLE_TIMESTAMP_WITHOUT_TIMEZONE_SESSION_PROPERTY =
    "spark.sql.iceberg.handle-timestamp-without-timezone";

public static final String STORE_TIMESTAMP_WITHOUT_TIMEZONE_SESSION_PROPERTY =
```
I think it's fine to just use one property for both reading and writing
Actually, this property had an incorrect name; I changed it to READ_TIMESTAMP_AS_TIMESTAMP_WITHOUT_TIMEZONE.
This property controls how the Spark TimestampType is represented in Iceberg. By default, Spark's TimestampType is converted to the Types.TimestampType.withZone() Iceberg type, but if READ_TIMESTAMP_AS_TIMESTAMP_WITHOUT_TIMEZONE is set to true, the Spark timestamp type will be converted to Types.TimestampType.withoutZone().
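The toggle described above can be sketched with simplified stand-in types (these are not the actual Iceberg classes; `IcebergTimestamp` and `convertSparkTimestamp` are hypothetical names used only for illustration):

```java
// Minimal sketch, assuming a boolean session flag selects the Iceberg type
// that Spark's TimestampType maps to. Not the actual Iceberg implementation.
public class TimestampTypeChoice {
    // Stand-in for Types.TimestampType.withZone() / withoutZone()
    enum IcebergTimestamp { WITH_ZONE, WITHOUT_ZONE }

    static IcebergTimestamp convertSparkTimestamp(boolean readAsWithoutZone) {
        // Default: Spark TimestampType maps to timestamp with zone (timestamptz);
        // the session property flips the mapping to timestamp without zone.
        return readAsWithoutZone ? IcebergTimestamp.WITHOUT_ZONE
                                 : IcebergTimestamp.WITH_ZONE;
    }

    public static void main(String[] args) {
        System.out.println(convertSparkTimestamp(false)); // WITH_ZONE
        System.out.println(convertSparkTimestamp(true));  // WITHOUT_ZONE
    }
}
```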
```java
 */
public static boolean hasTimestampWithoutZone(Schema schema) {
  return TypeUtil.find(schema, t ->
      t.typeId().equals(Type.TypeID.TIMESTAMP) && !((Types.TimestampType) t).shouldAdjustToUTC()
```
could just check against TimestampType.withoutZone() which returns the singleton instance
good point, changed the code
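The suggested singleton check can be sketched with a stand-in class (this is not the real `Types.TimestampType`; the nested class below only mimics its singleton `withZone()`/`withoutZone()` accessors):

```java
// Sketch of why comparing against the singleton is equivalent to checking
// shouldAdjustToUTC(): withoutZone() always returns the same instance.
public class SingletonCheck {
    // Hypothetical stand-in for Iceberg's Types.TimestampType
    static final class TimestampType {
        private static final TimestampType WITH_ZONE = new TimestampType(true);
        private static final TimestampType WITHOUT_ZONE = new TimestampType(false);
        private final boolean adjustToUTC;

        private TimestampType(boolean adjustToUTC) { this.adjustToUTC = adjustToUTC; }

        static TimestampType withZone() { return WITH_ZONE; }
        static TimestampType withoutZone() { return WITHOUT_ZONE; }
        boolean shouldAdjustToUTC() { return adjustToUTC; }
    }

    public static void main(String[] args) {
        TimestampType t = TimestampType.withoutZone();
        // Field-based check and singleton comparison agree:
        System.out.println(!t.shouldAdjustToUTC());           // true
        System.out.println(t == TimestampType.withoutZone()); // true
    }
}
```

Because the factory methods hand out singletons, an identity comparison (`t == TimestampType.withoutZone()`) is both clearer and cheaper than casting and checking `shouldAdjustToUTC()`.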
```java
class SparkTypeToType extends SparkTypeVisitor<Type> {
  private final StructType root;
  private int nextId = 0;
  private final boolean useTimestampWithoutZone;
```
I'm not sure what this flag is for. I think it's probably safer to always return TimestampType.withZone from here and handle the mismatch outside of this code.
Moved this logic to SparkFixupTimestampType.java
```java
public class SparkUtil {

  public static final String HANDLE_TIMESTAMP_WITHOUT_TIMEZONE_FLAG = "spark-handle-timestamp-without-timezone";
```
Actually, we could probably just use the same session property here, so we have only one property:
`spark.sql.iceberg.convert-timestamp-without-timezone`
Or something like that.
```java
    return SparkOrcValueWriters.decimal(primitive.getPrecision(), primitive.getScale());
  case TIMESTAMP_INSTANT:
  case TIMESTAMP:
    return SparkOrcValueWriters.timestampTz();
```
Do we need corresponding code in the ParquetWriter as well?
```java
private StructType lazyType() {
  if (type == null) {
    Preconditions.checkArgument(readTimestampWithoutZone || !SparkUtil.hasTimestampWithoutZone(lazySchema()),
```
Actually cancel what I said about those other spots, this seems like a great place to check for whether we are allowed to do the TZ conversion
ok, so we don't need any changes here?
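The guard discussed above (fail fast when the schema contains timestamp-without-zone fields but the session property does not allow reading them) can be sketched in plain Java, without the Guava `Preconditions` dependency. The method name and message are hypothetical, for illustration only:

```java
// Minimal sketch of the lazyType() precondition: reject schemas with
// timestamp-without-zone fields unless the session property permits them.
public class TimestampGuard {
    static void checkTimestampReadable(boolean readTimestampWithoutZone,
                                       boolean schemaHasTimestampWithoutZone) {
        if (!readTimestampWithoutZone && schemaHasTimestampWithoutZone) {
            throw new IllegalArgumentException(
                "Cannot handle timestamp without timezone fields in Spark; " +
                "enable the session property to read them");
        }
    }

    public static void main(String[] args) {
        checkTimestampReadable(true, true);   // property enabled: allowed
        checkTimestampReadable(false, false); // no such fields: allowed
        try {
            checkTimestampReadable(false, true); // disallowed combination
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Centralizing the check at the point where the Spark type is first materialized means the other call sites do not need their own flag handling.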
…ted level should be one of the following: 6, 8.
…comments. Fixed code formatting
Force-pushed from e869a15 to bc316c4
I think I need to close this PR; the main PR is apache#2757.
Add a `spark.sql.iceberg.store-timestamp-without-zone` Spark config to indicate which Iceberg type (`Types.TimestampType.withZone()` or `Types.TimestampType.withoutZone()`) will be used for the Spark `TimestampType` type.